Counterfactual Regret Minimization in Sequential Security Games

نویسندگان

Viliam Lisý

Trevor Davis

Michael H. Bowling

چکیده

Many real world security problems can be modelled as finite zero-sum games with structured sequential strategies and limited interactions between the players. An abstract class of games unifying these models are the normal-form games with sequential strategies (NFGSS). We show that all games from this class can be modelled as well-formed imperfect-recall extensiveform games and consequently can be solved by counterfactual regret minimization. We propose an adaptation of the CFR algorithm for NFGSS and compare its performance to the standard methods based on linear programming and incremental game generation. We validate our approach on two security-inspired domains. We show that with a negligible loss in precision, CFR can compute a Nash equilibrium with five times less computation than its competitors. Game theory has been recently used to model many real world security problems, such as protecting airports (Pita et al. 2008) or airplanes (Tsai et al. 2009) from terrorist attacks, preventing fare evaders form misusing public transport (Yin et al. 2012), preventing attacks in computer networks (Durkota et al. 2015), or protecting wildlife from poachers (Fang, Stone, and Tambe 2015). Many of these security problems are sequential in nature. Rather than a single monolithic action, the players’ strategies are formed by sequences of smaller individual decisions. For example, the ticket inspectors make a sequence of decisions about where to check tickets and which train to take; a network administrator protects the network against a sequence of actions an attacker uses to penetrate deeper into the network. Sequential decision making in games has been extensively studied from various perspectives. Recent years have brought significant progress in solving massive imperfectinformation extensive-form games with a focus on the game of poker. Counterfactual regret minimization (Zinkevich et al. 2008) is the family of algorithms that has facilitated much of this progress, with a recent incarnation (Tammelin et al. 2015) essentially solving for the first time a variant of poker commonly played by people (Bowling et al. 2015). However, there has not been any transfer of these results to research on real world security problems. Copyright c © 2016, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. We focus on an abstract class of sequential games that can model many sequential security games, such as games taking place in physical space that can be discretized as a graph. This class of games is called normal-form games with sequential strategies (NFGSS) (Bosansky et al. 2015) and it includes, for example, existing game theoretic models of ticket inspection (Jiang et al. 2013), border patrolling (Bosansky et al. 2015), and securing road networks (Jain et al. 2011). In this work we formally prove that any NFGSS can be modelled as a slightly generalized chance-relaxed skew well-formed imperfect-recall game (CRSWF) (Lanctot et al. 2012; Kroer and Sandholm 2014), a subclass of extensiveform games with imperfect recall in which counterfactual regret minimization is guaranteed to converge to the optimal strategy. We then show how to adapt the recent variant of the algorithm, CFR, directly to NFGSS and present experimental validation on two distinct domains modelling search games and ticket inspection. We show that CFR is applicable and efficient in domains with imperfect recall that are substantially different from poker. Moreover, if we are willing to sacrifice a negligible degree of approximation, CFR can find a solution substantially faster than methods traditionally used in research on security games, such as formulating the game as a linear program (LP) and incrementally building the game model by double oracle methods.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Monte Carlo Sampling for Regret Minimization in Extensive Games

Sequential decision-making with multiple agents and imperfect information is commonly modeled as an extensive game. One efficient method for computing Nash equilibria in large, zero-sum, imperfect information games is counterfactual regret minimization (CFR). In the domain of poker, CFR has proven effective, particularly when using a domain-specific augmentation involving chance outcome samplin...

متن کامل

Counterfactual Regret Minimization for Decentralized Planning

Regret minimization is an effective technique for almost surely producing Nash equilibrium policies in coordination games in the strategic form. Decentralized POMDPs offer a realistic model for sequential coordination problems, but they yield doubly exponential sized games in the strategic form. Recently, counterfactual regret has offered a way to decompose total regret along a (extensive form)...

متن کامل

Regret Minimization in Multiplayer Extensive Games

The counterfactual regret minimization (CFR) algorithm is state-of-the-art for computing strategies in large games and other sequential decisionmaking problems. Little is known, however, about CFR in games with more than 2 players. This extended abstract outlines research towards a better understanding of CFR in multiplayer games and new procedures for computing even stronger multiplayer strate...

متن کامل

Regret Minimization in Games with Incomplete Information

Extensive games are a powerful model of multiagent decision-making scenarioswith incomplete information. Finding a Nash equilibrium for very large instancesof these games has received a great deal of recent attention. In this paper, wedescribe a new technique for solving large games based on regret minimization.In particular, we introduce the notion of counterfactual regret, whi...

متن کامل